Repeatability and reproducibility of a handheld quantitative G6PD diagnostic

Background
The introduction of novel short-course treatment regimens for the radical cure of Plasmodium vivax requires reliable point-of-care diagnosis that can identify glucose-6-phosphate dehydrogenase (G6PD) deficient individuals. While deficient males can be identified using a qualitative diagnostic test, females require a quantitative measurement because heterozygous females can have intermediate G6PD activity. SD Biosensor (Republic of Korea) has developed a handheld quantitative G6PD diagnostic (STANDARD G6PD test) that has shown approximately 90% accuracy in field studies for identifying individuals with intermediate or severe deficiency. The device can only be considered for routine care if the assay is highly precise.

Methods and findings
Commercial lyophilised controls (ACS Analytics, USA) with high, intermediate, and low G6PD activities were assessed 20 times on each of 10 Biosensor devices and compared with spectrophotometry (Pointe Scientific, USA). Each device was then dispatched to one of 10 different laboratories with a standard set of the controls. Each control was tested 40 times at each laboratory by a single user and compared with spectrophotometry results. When tested at one site, the mean coefficient of variation (CV) across all devices was 0.111, 0.172 and 0.260 for high, intermediate, and low controls, respectively; combined G6PD Biosensor readings correlated well with spectrophotometry (rs = 0.859, p<0.001). When tested in different laboratories, correlation was lower (rs = 0.604, p<0.001) and G6PD activity determined by Biosensor for the low and intermediate controls overlapped. The use of lyophilised human blood samples rather than fresh blood may have affected these findings. Biosensor G6PD readings did not differ significantly between sites (p = 0.436), whereas spectrophotometry readings differed markedly between sites (p<0.001).

Conclusions
Repeatability and inter-laboratory reproducibility of the Biosensor were good, although the device did not reliably discriminate between intermediate and low G6PD activities of the lyophilized specimens. Clinical studies are now required to assess the device's performance in practice.

Author summary
Novel treatment regimens for the radical cure of P. vivax malaria are more effective than current options but require prior quantitative G6PD testing. The reference method for quantitative G6PD measurement is spectrophotometry, but its operational characteristics make it unsuitable for routine use. Furthermore, poor inter-laboratory reproducibility of spectrophotometry has prevented quantitative global definitions of G6PD deficiency. SD Biosensor (ROK) has developed a novel handheld "Biosensor" device (STANDARD G6PD test), which measures G6PD activity within two minutes and has operational characteristics suited to point-of-care diagnosis. Reported accuracy of the Biosensor against spectrophotometry is around 90%, but its reproducibility remains unknown. This article reports the reproducibil-

Introduction
The 8-aminoquinolines primaquine and tafenoquine are the only drugs currently on the market with hypnozoitocidal properties, which are important for clearing Plasmodium vivax and P. ovale from the human host [1][2][3]. Although well tolerated by most recipients, 8-aminoquinolines are strong oxidants that can cause hemolysis in individuals with low activity of the glucose-6-phosphate dehydrogenase (G6PD) enzyme, known as G6PD deficiency (G6PDd) [4,5]. Because the G6PD gene is located on the X chromosome, males are either hemizygous deficient or normal, whereas females are homozygous deficient, homozygous normal, or heterozygous.
Heterozygous females have two distinct red blood cell (RBC) populations, G6PD normal and G6PD deficient, which circulate in a ratio determined by the random process of lyonization [6]. Therefore, the G6PD activity of heterozygous females, and their associated hemolytic risk, depends on the proportion of deficient cells, which are the cells at greatest risk of drug-induced hemolysis. Approximately 400 million people worldwide are affected by G6PDd, with allele frequencies reaching up to 35% in malaria-endemic areas [7,8]. Accordingly, the WHO recommends routine G6PD testing prior to radical cure (schizontocidal and hypnozoitocidal treatment) with primaquine whenever possible [9]. A 14-day course of primaquine is prescribed to patients with more than 30% G6PD enzyme activity, while eight weekly doses are recommended for patients with less than 30% activity [9]. These long treatment courses reduce adherence and consequently lead to lower effectiveness [10,11]. Short-course, high-dose primaquine regimens, as well as a single-dose tafenoquine regimen, are likely to improve effectiveness; however, they will require more stringent criteria to protect those at risk of hemolysis [12,13]. While qualitative G6PD diagnostics have good discriminatory power at the 30% activity threshold, they cannot discriminate patients at higher G6PD activity levels [14]; to date this can only be done by quantitative spectrophotometry [15,16]. Not only is spectrophotometry logistically unsuitable for supporting case management of P. vivax patients in the remote areas where most patients live, it has also been shown to exhibit significant variability in its measurements [17]. For example, definitions of 100% G6PD enzyme activity in U/gHb differ significantly between studies [15].
Given the current definitions of "G6PD deficient" (<30% of normal G6PD activity) and "intermediate" (30-70% or 30-80% of normal activity), this leads to different diagnostic cut-offs between areas [18,19]. Comparisons of G6PD activity in standardized quality control samples show significant variation between laboratories, suggesting that at least some of the variability observed in population-level G6PD readings may be due to the spectrophotometric assay itself [15]. This diagnostic variability confounds the definition of global absolute cut-offs for case management. As a consequence, site- and assay-specific G6PD baseline (100% activity) levels need to be established before local deficient and intermediate thresholds can be set, adding significant complexity to the roll-out of G6PD testing in P. vivax endemic settings [18]. Because G6PD levels are affected by RBC density [20], any G6PD measurement needs to be normalized by an Hb reading, and this may also contribute to the observed variability in spectrophotometry readings.
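The arithmetic behind these relative thresholds can be sketched as follows. The snippet expresses a Hb-normalized reading as a percentage of a locally defined 100% activity level and applies the <30% deficient and 30-70% (or 30-80%) intermediate bands quoted above. The function name, its parameters and the baseline values in the example are hypothetical, and treating exactly 30% and exactly 70% (or 80%) as intermediate is an assumption about boundary handling. It illustrates the point made in the text: the same absolute reading can fall into different categories when two sites define 100% activity differently.

```python
def classify_g6pd(activity_u_per_ghb: float,
                  normal_activity_u_per_ghb: float,
                  upper_intermediate_pct: float = 70.0) -> str:
    """Classify a Hb-normalized G6PD reading relative to a site-specific baseline.

    Hypothetical helper: `normal_activity_u_per_ghb` is the locally defined
    100% activity level, and `upper_intermediate_pct` is 70 or 80 depending on
    which definition of "intermediate" is used. Boundary handling is assumed.
    """
    if normal_activity_u_per_ghb <= 0:
        raise ValueError("baseline (100%) activity must be positive")
    percent_of_normal = 100.0 * activity_u_per_ghb / normal_activity_u_per_ghb
    if percent_of_normal < 30.0:
        return "deficient"
    if percent_of_normal <= upper_intermediate_pct:
        return "intermediate"
    return "normal"


# The same 3.5 U/gHb reading against two illustrative (made-up) baselines:
print(classify_g6pd(3.5, 8.0))    # 43.75% of normal -> intermediate
print(classify_g6pd(3.5, 12.0))   # about 29.2% of normal -> deficient
# A 70% versus 80% upper bound changes the category of a 75% reading:
print(classify_g6pd(6.0, 8.0))                               # normal
print(classify_g6pd(6.0, 8.0, upper_intermediate_pct=80.0))  # intermediate
```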
A handheld quantitative G6PD diagnostic has been developed by SD Biosensor (STANDARD G6PD test, Suwon-si, ROK), hereafter referred to as the "Biosensor". The system consists of the Biosensor device and a single-use test strip that is inserted into it. To generate a reading, 10 μl of blood is added to a lysis buffer, and 10 μl of the blood-buffer solution is then applied to the single-use test strip inserted into the Biosensor. The test strip contains 5-bromo-4-chloro-3-indolyl phosphate (BCIP) and nitro blue tetrazolium (NBT); in the presence of the G6PD enzyme, NBT is reduced to a violet product, and the color intensity, which is directly proportional to G6PD activity, is measured by reflectance photometry. The Biosensor quantifies Hb concentration using a photo-reflectance-based algorithm informed by the sample's color intensity, measured on a separate spot from the one used for G6PD activity. The handheld device displays G6PD activity (in U/gHb) and hemoglobin (Hb) concentration (in g/dL) two minutes after the blood-buffer solution is applied; however, the manufacturer indicates that results cannot be interpreted if the Hb reading is 7 g/dL or below. Field evaluation studies showed the Biosensor to have an accuracy of approximately 90% in identifying intermediate and deficient individuals when compared with spectrophotometry [21][22][23]. Because its ease of use, time to result, and logistic and operational feasibility compare favorably with spectrophotometry, the Biosensor has the potential to provide a quantitative G6PD measurement at the bedside and to support point-of-care diagnosis and treatment decisions [17].
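The manufacturer's Hb validity rule described above can be expressed as a simple filter applied before a reading is interpreted. This is a sketch only: the Biosensor exposes no such programming interface, and the `BiosensorReading` record, its field names and the example values are hypothetical; only the 7 g/dL limit comes from the text.

```python
from dataclasses import dataclass

# Per the manufacturer, results are not interpretable at Hb <= 7 g/dL.
HB_VALIDITY_LIMIT_G_PER_DL = 7.0


@dataclass(frozen=True)
class BiosensorReading:
    """Hypothetical record of one displayed Biosensor result."""
    g6pd_u_per_ghb: float  # Hb-normalized G6PD activity
    hb_g_per_dl: float     # Hb concentration shown by the device


def is_interpretable(reading: BiosensorReading) -> bool:
    """True only if the Hb reading is strictly above the validity limit."""
    return reading.hb_g_per_dl > HB_VALIDITY_LIMIT_G_PER_DL


def interpretable_readings(readings):
    """Keep readings above the Hb limit; readings at or below it are excluded."""
    return [r for r in readings if is_interpretable(r)]


readings = [BiosensorReading(7.9, 13.2), BiosensorReading(4.1, 7.0), BiosensorReading(2.3, 6.4)]
print(len(interpretable_readings(readings)))  # 1
```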
The aim of this study was not to assess accuracy, but to determine the Biosensor's repeatability (assay precision when repeated under constant conditions) and reproducibility (assay precision under different conditions, such as across devices, operators and sites), since robust performance on both is necessary before universal Biosensor thresholds for clinical decisions can be rolled out [24].

Ethics statement
IRB approvals or waivers were obtained from each participating institution prior to conducting the laboratory study. Risks to the technicians of using reconstituted human blood controls were discussed during the training and minimized by involving only technicians experienced in Good Laboratory Practice. Since no participants were enrolled, no informed consent was collected (S1 Table).

Overview
Although both the Biosensor and the reference method, spectrophotometry, were developed for fresh blood, we used lyophilized and reconstituted standardized controls instead to ensure that identical samples were tested throughout the study period [21]. The study comprised two phases (Fig 1). In the first phase (Phase A), baseline repeatability and inter-device reproducibility were defined under identical conditions. Ten Biosensors were tested repeatedly in parallel in a single laboratory by a single technician using commercial controls with a range of G6PD enzyme activity levels classified by the manufacturer as "High", "Intermediate", and "Low". Each control was tested 20 times over the course of five days with each Biosensor device and in parallel by spectrophotometry and Hemocue. The study would only proceed to the second phase if the mean repeatability of all Biosensors met or exceeded minimal requirements (see Statistical analysis below). In the second phase (Phase B), reproducibility was assessed by shipping each device to a different, well-established laboratory. At each site, an identical set of controls was tested 40 times over the course of 10 days (120 measurements in total per site) by the Biosensor and by each reference assay, spectrophotometry and Hemocue. Reference methods were standardized as detailed below, and standard operating procedures were followed across all sites.
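The replicate counts above can be checked with a small enumeration of one device's (Phase A) or one site's (Phase B) testing plan. The even split of four replicates per control per day is an assumption inferred from 20 tests over five days and 40 tests over 10 days; the daily allocation is not stated, and in practice Phase A Low controls were only available for four days. The function name is hypothetical.

```python
from itertools import product

CONTROLS = ("High", "Intermediate", "Low")


def run_schedule(days: int, replicates_per_day: int, controls=CONTROLS):
    """Enumerate (day, control, replicate) tuples for one device or site.

    Assumes every control is tested the same number of times on every day.
    """
    return list(product(range(1, days + 1), controls,
                        range(1, replicates_per_day + 1)))


phase_a = run_schedule(days=5, replicates_per_day=4)   # per Biosensor device
phase_b = run_schedule(days=10, replicates_per_day=4)  # per laboratory

print(sum(1 for _, c, _ in phase_a if c == "High"))  # 20 tests per control
print(len(phase_b))                                  # 120 tests per site
```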
Control samples. To ensure that identical samples were tested across all sites, commercial controls were used, with all controls within one phase being from the same lot (Analytical Control Systems, Inc., Indiana, USA; S2 Table). ACS controls are routinely used to monitor the quality of reference G6PD testing by spectrophotometry. They are derived from whole blood obtained from human donors in FDA-licensed centers and pooled to represent high, intermediate, or low G6PD activity (Cat. Nos.: HC-108, HC-108IN, and HC-108DE, respectively). ACS provides lot-specific G6PD activity range guides and Hb estimates, based on automated spectrophotometry conducted by Pointe Scientific on the reconstituted controls (32 vials in total). ACS recommends that laboratories develop their own in-house ranges but also provides guideline ranges per control category. Because the inherent site-specific variability of spectrophotometry prevented us from establishing spectrophotometry-based ranges applicable to all sites [15], we considered the manufacturer-recommended ranges. ACS controls were provided in lyophilized form and reconstituted in each laboratory at standardized intervals. All reconstituted controls were stored at 4 °C and used within two days.
Biosensor. The SD Biosensor STANDARD G6PD test was performed following the manufacturer's instructions. The assay uses single-use test strips (Cat. No. 02G6S10) that are inserted into the Biosensor (Cat. No. 02GA10). All test strips used in this study were from the same manufacturing lot (S2 Table). In brief, 10 μl of the reconstituted control sample was mixed with the assay extraction buffer, then 10 μl of the control-buffer solution was added to a single-use test strip that had already been inserted into the Biosensor. Sample transfer devices supplied with the Biosensor test, known as "Ezi tubes", were used at each step. The displayed Hb-normalized G6PD activity and Hb readings were recorded once the 2-minute measurement was complete.
Each Biosensor's functionality was checked daily with a multi-use STANDARD G6PD check strip, and sample testing only commenced if the quality control check was completed successfully. Every five days, each Biosensor was also quality controlled with reconstituted control samples provided by SD Biosensor ("level 1" and "level 2"; Cat. No. 02G6C10). If results were outside the recommended ranges for G6PD activity or Hb, the quality control testing was repeated. If three consecutive quality control readings were outside the recommended ranges, testing with the respective Biosensor device was aborted.

PLOS NEGLECTED TROPICAL DISEASES
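The quality-control escalation rule can be sketched as a bounded retry loop. In this sketch a QC attempt passes only if both the G6PD and the Hb value fall within the recommended range, and the device is withdrawn after three consecutive failures. The `measure` callback, the function names and the numeric ranges in the example are hypothetical placeholders, not SD Biosensor's published acceptance ranges.

```python
from typing import Callable, Tuple

Range = Tuple[float, float]


def in_range(value: float, bounds: Range) -> bool:
    low, high = bounds
    return low <= value <= high


def run_qc(measure: Callable[[], Tuple[float, float]],
           g6pd_range: Range, hb_range: Range,
           max_consecutive_failures: int = 3) -> bool:
    """Repeat QC until one reading passes, or stop after N consecutive failures.

    `measure` returns one (G6PD U/gHb, Hb g/dL) QC reading per call.
    Returns True if testing may continue, False if the device is withdrawn.
    """
    for _ in range(max_consecutive_failures):
        g6pd, hb = measure()
        if in_range(g6pd, g6pd_range) and in_range(hb, hb_range):
            return True
    return False


# Illustrative ranges and readings only: fails twice, then passes on the third attempt.
qc_readings = iter([(3.0, 12.0), (6.5, 18.0), (6.4, 12.5)])
print(run_qc(lambda: next(qc_readings), g6pd_range=(5.0, 8.0), hb_range=(10.0, 15.0)))  # True
```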
Spectrophotometry and Hemocue. Spectrophotometry was performed using kits from Pointe Scientific (Michigan, USA; Cat. No. G7583) according to the manufacturer's recommendations. The brand of spectrophotometer varied between laboratories, but all instruments were temperature-controlled, cuvette-based, and measured absorbance at 340 nm. Sample absorbance was measured at 0 and 5 minutes at 37 °C, and the difference in absorbance was used to calculate G6PD activity in U/dL following a standard formula provided by the Pointe Scientific assay manufacturer. Measurements were run in duplicate (i.e. the sample reaction was divided into two separate cuvettes and measured independently), and the mean of the two G6PD results was recorded. If the coefficient of variation of the two measurements exceeded 15% (CV>0.15), a third measurement was required. G6PD activity was then normalized by a Hb reading (Hemocue 201, 301 or 801, Angelholm, Sweden) to generate G6PD activity in U/gHb. Hemocue devices were used following the manufacturer's instructions, and the Hb reading was recorded separately.
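As a worked illustration of the calculation described above, the sketch below converts an absorbance change at 340 nm into activity per dL of blood using the Beer-Lambert law and the standard molar absorptivity of NADPH (6.22 L per mmol per cm), normalizes it by Hb, and applies the duplicate-CV rule. It is not the Pointe Scientific formula, which is not reproduced here and may include kit-specific constants; the reaction and blood volumes are hypothetical, and a 1 cm path length is assumed.

```python
from statistics import mean, stdev

# Molar absorptivity of NADPH at 340 nm, in L mmol^-1 cm^-1.
NADPH_EPSILON_340 = 6.22


def activity_u_per_dl(delta_abs: float, minutes: float, total_volume_ml: float,
                      blood_volume_ml: float, path_cm: float = 1.0) -> float:
    """G6PD activity per dL of blood from the absorbance change at 340 nm.

    1 U = 1 umol NADPH formed per minute. `blood_volume_ml` is the volume of
    whole blood represented in the cuvette (assumed; adjust for any pre-dilution).
    """
    rate_per_min = delta_abs / minutes
    mmol_per_l_per_min = rate_per_min / (NADPH_EPSILON_340 * path_cm)    # in the cuvette
    u_per_l_reaction = mmol_per_l_per_min * 1000.0                       # mmol -> umol
    u_per_l_blood = u_per_l_reaction * total_volume_ml / blood_volume_ml  # undo dilution
    return u_per_l_blood / 10.0                                           # U/L -> U/dL


def normalize_to_hb(u_per_dl: float, hb_g_per_dl: float) -> float:
    """(U/dL) / (g Hb/dL) = U/gHb."""
    return u_per_dl / hb_g_per_dl


def needs_third_reading(first: float, second: float, cv_limit: float = 0.15) -> bool:
    """Duplicate rule from the protocol: repeat if the CV of the pair exceeds 0.15."""
    pair = [first, second]
    return stdev(pair) / mean(pair) > cv_limit


# Illustrative numbers only (not kit values): dA = 0.311 over 5 min, 1.0 mL reaction, 10 uL blood.
u_dl = activity_u_per_dl(0.311, 5.0, total_volume_ml=1.0, blood_volume_ml=0.01)
print(round(u_dl, 1), round(normalize_to_hb(u_dl, 12.5), 2))        # 100.0 8.0
print(needs_third_reading(8.0, 10.0), needs_third_reading(9.0, 10.0))  # True False
```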
Training. All technicians conducting the experiments held at least a bachelor's degree, had several years' experience of working in a laboratory, and were familiar with spectrophotometry, although not necessarily with the G6PD assay used in this study. Each technician received standardized training in an online session and had to pass a Biosensor proficiency test before conducting the experiments. In each laboratory, a single technician performed the analyses with each diagnostic across all study testing days.
Statistical analysis. Data were recorded on standardized forms and then transferred to an Excel database (Microsoft Corp, Washington, USA), with standard data entry cross-checks. Analysis was undertaken using Stata version 15 (Stata Corp, College Station, Texas, USA).
Depending on data distribution, summary findings were reported as means with 95% confidence intervals or as medians with interquartile ranges (IQR). Repeatability and reproducibility were assessed by linear, random effects, and mixed effects regression models as appropriate, and by calculating coefficients of variation (CV). Spearman's rank correlation coefficient (rs) was calculated to determine the correlation between the Biosensor and the reference method. Absolute differences between the experimental (Biosensor) and reference assays (spectrophotometer and Hemocue) were assessed by Bland-Altman plots and the Wilcoxon matched-pairs signed-rank test. The study only progressed to Phase B if the mean CV for the High control across all devices was less than 0.150 [25]. Combined mean differences and correlation coefficients were calculated for Phase A, where one spectrophotometry reading served as the reference for all ten devices. In Phase B, findings were not combined, since each site performed its own reference measurement and the reference method for G6PD activity is known to show significant variation, which does not allow a direct comparison [15]. Following Bonferroni correction, the level of significance was set at p<0.005 whenever multiple comparisons were made.
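The CV and progression rule above can be made explicit with a short sketch. CV is computed here as the sample standard deviation divided by the mean; per-device CVs for the High control are averaged and compared against 0.150. The analyses themselves were run in Stata, so the function names and example readings are hypothetical, and reading p<0.005 as 0.05 divided by ten comparisons (for example, ten devices or sites) is an assumption about how the corrected threshold was derived.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV as a fraction: sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def proceed_to_phase_b(high_readings_by_device, threshold=0.150):
    """Progression rule: mean per-device CV for the High control below the threshold."""
    cvs = [coefficient_of_variation(v) for v in high_readings_by_device.values()]
    return mean(cvs) < threshold


def bonferroni_alpha(alpha=0.05, n_comparisons=10):
    """Per-comparison significance level after Bonferroni correction."""
    return alpha / n_comparisons


# Made-up High-control readings (U/gHb) for two hypothetical devices.
readings = {"device_1": [9.0, 10.0, 11.0], "device_2": [9.5, 10.0, 10.5]}
print(round(coefficient_of_variation(readings["device_1"]), 3))  # 0.1
print(proceed_to_phase_b(readings))                              # True
print(bonferroni_alpha())                                        # 0.005
```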

Phase A
Results from the ten Biosensor devices tested with High and Intermediate controls were available for five consecutive days, and Low controls for four days due to limited stocks of same-lot Low controls.
Hb-normalized G6PD activities did not differ significantly between Biosensor devices (Fig 2 and S3 Table, p = 1.000). However, Hb readings differed significantly (p<0.001, adjusted R² = 0.121). Compared with device 1 (baseline), devices 3 and 9 had significantly lower Hb readings, with differences of 0.477 g/dL (p = 0.007) and 0.623 g/dL (p<0.001), respectively, while device 5 had a significantly higher Hb result, with a difference of 0.665 g/dL (p<0.001), when comparing readings from all three controls (Fig 3 and S3 Table).
Each run included a spectrophotometer measurement, which was matched with a result from each of the 10 Biosensor devices (Fig 1). The Hb-normalized G6PD activity readings of the Biosensor and spectrophotometry were positively correlated across all three control categories (rs = 0.859, p<0.001); however, median readings differed significantly for Low (mean difference: -0.1 U/gHb, 95% limits of agreement [95%LoA]: -0.8 to 0.5, p<0.001) and Intermediate controls (mean difference: -1.2 U/gHb, 95%LoA: -2.3 to -0.1, p<0.001), while median activities did not differ significantly for High controls (mean difference: -0.1 U/gHb, 95%LoA: -2.3 to 2.1, p = 0.554). Hb readings from the Biosensor and Hemocue were significantly correlated for five of the 10 devices (p<0.005; Table 1).
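The paired comparisons reported above (mean difference with 95% limits of agreement, and Spearman's rs) can be reproduced in outline with the sketch below. It uses the conventional Bland-Altman limits (bias ± 1.96 SD of the paired differences) and computes Spearman's coefficient as the Pearson correlation of average ranks. The study's analyses were run in Stata; p-values and the Wilcoxon signed-rank test are not reproduced here, and the example readings are made up.

```python
from statistics import mean, stdev


def bland_altman(test, reference):
    """Bias (mean of test - reference) and 95% limits of agreement (bias +/- 1.96 SD)."""
    if len(test) != len(reference):
        raise ValueError("paired readings must have equal length")
    diffs = [t - r for t, r in zip(test, reference)]
    bias, sd = mean(diffs), stdev(diffs)
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Made-up paired G6PD readings (U/gHb): Biosensor vs. spectrophotometry.
biosensor = [1.9, 2.2, 7.8, 8.4, 2.0]
spectro = [2.9, 2.3, 8.1, 8.3, 3.1]
bias, (lower, upper) = bland_altman(biosensor, spectro)
print(round(bias, 2), round(lower, 2), round(upper, 2))
print(round(spearman_rs(biosensor, spectro), 2))  # 0.7
```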
Median readings of High, Intermediate, and Low controls were distinct by spectrophotometry (High vs. Intermediate: p<0.001 and Intermediate vs. Low: p<0.001). Median Biosensor readings also differed significantly between all three control categories (all p<0.001), but six Intermediate readings overlapped with 122 Low results. All six overlapping Intermediate readings were generated by different devices and during different testing runs. Influential outliers generated by two spectrophotometry readings were identified visually (Fig 4).
The median CV across all Biosensor G6PD measurements for High controls was 0.111, below the pre-defined acceptability threshold of 0.150, while the CV for Hb measurement was below 0.070 for all controls (Table 2). The study therefore proceeded to Phase B.

Phase B
The correlation between Biosensor and reference method was significant and positive for G6PD readings (in U/gHb), while Hb readings of five Biosensor devices did not correlate significantly with the Hemocue at the 0.5% (p<0.005) significance level (Table 3).
Mean G6PD readings by Biosensor and spectrophotometry differed significantly at five of 10 sites, while Hb readings differed significantly between Biosensor and Hemocue at eight of 10 sites. Observed mean differences across the ten devices ranged from -2.6 U/gHb to +1.1 U/gHb between Biosensor and spectrophotometry, and from -3.0 g/dL to 0.6 g/dL between Biosensor and Hemocue (Table 3). Low and Intermediate controls could not be differentiated by Biosensor; in fact, median readings of Intermediate controls (1.9 U/gHb, IQR: 1.6-2.1) were significantly lower than median Low readings (2.2 U/gHb, IQR: 2.0-2.4, p<0.001), a trend seen across all sites. In contrast, High controls (8.0 U/gHb, IQR: 7.4-8.7) were clearly distinct from the other controls across all sites (p<0.001) (Figs 5 and S2-S5, and S4 Table).
Repeatability (within-site assay precision, as measured by CV) varied more between sites for spectrophotometry than for the Biosensor (Fig 5). Site-level CVs of the Biosensor for the High control ranged from 0.103 to 0.125 (SD: 0.009), whereas those of spectrophotometry ranged from 0.050 to 0.137 (SD: 0.043) (S5 Table).

Comparing Phases A and B
The CVs did not differ significantly between Phase A and Phase B (p = 0.201). The Intermediate and High controls in Phases A and B were from the same manufacturing lots, so Biosensor readings could be compared directly (S2 Table). G6PD activities differed significantly between Phases A and B (p<0.001); Intermediate control readings in Phase A were significantly higher for eight of the 10 devices, but the difference did not exceed 0.4 U/gHb. For High controls, a significant difference was observed in three devices, with a maximum difference of 1.5 U/gHb (Table 4).

Discussion
Biosensor G6PD readings were reproducible: they did not differ significantly between devices, whether the devices were handled by the same technician (Phase A), operated in different settings by different end users (Phase B), or handled by different operators (Phase A vs B). In contrast, G6PD activity measured by spectrophotometry varied significantly between sites despite standardized controls and procedures, a phenomenon that has been reported previously [15].
Four of the ten participating sites had not used the Biosensor previously. Following standardized online training, all sites generated G6PD measurements with good precision that did not differ significantly between sites. Precision of the spectrophotometry results was more variable between sites, with some sites exceeding Biosensor repeatability while others had lower precision. The Biosensor and spectrophotometry results correlated well when assessed in a single laboratory in Phase A. However, while spectrophotometry discriminated reliably between the three control types, the Biosensor distinguished Low from Intermediate controls less clearly, with six of the 200 repeat Intermediate measurements overlapping with Low results. Correlation between the Biosensor and spectrophotometry was lower in Phase B, when devices were assessed in different laboratories, due to the variability of spectrophotometry. In Phase B the Biosensor did not distinguish between Low and Intermediate controls; in fact, results were significantly lower for Intermediate than for Low controls, and this was consistent across all sites. In contrast, spectrophotometry in Phase B distinguished between all three control categories.
Besides G6PD activity, the Biosensor also measures and displays Hb concentration. The repeatability of Hb measurements by Biosensor was better than by Hemocue, with the best inter-device repeatability observed when either device was operated by a single user. Hb readings of the two devices correlated poorly, partly because the recommended Hb point estimates for all three controls were very similar. In Phase A, absolute pooled Hb readings for the Biosensor were 1.8 g/dL lower than paired Hemocue readings; however, Biosensor readings were closer to the point estimate recommended by the ACS manufacturer. Determining the accuracy of Biosensor Hb readings against the Hemocue reference assay with reconstituted lyophilised controls is of limited clinical relevance. A study from the US compared paired Hb measurements from fresh venous samples by Biosensor and Hemocue (model 201+) and found a mean difference of 1.0 g/dL [22]; a study comparing Biosensor Hb readings from venous blood samples with complete blood count (CBC) results found that readings differed by 0.4 g/dL [21]; and a recent study from Brazil again found a mean difference of less than 1 g/dL [23].
Table 4. Mean difference in G6PD activity for Intermediate and High controls by each Biosensor device between Phases A and B. Low controls were not directly comparable between Phases as these were from different lots.
Our findings have several limitations. G6PD activity and Hb levels were measured in commercial lyophilised controls to ensure cross-laboratory standardisation, but the Biosensor and reference assays were developed for testing fresh venous or capillary blood. Stabilizing agents contained in the controls may have affected the Biosensor and/or the reference method, and this effect may differ by assay.
Repeatability and reproducibility of each assay should not have been affected by this difference, but the accuracy of either device may have been, and accuracy is best assessed using fresh blood samples [21,22]. While providing reference ranges, the manufacturer ACS suggests developing in-house reference ranges for all controls. We were unable to establish spectrophotometry-based ranges applicable to all sites due to the inherent site-specific variability of spectrophotometry [15] and instead considered the ranges provided. This approach likely explains why G6PD readings generated by Biosensor and spectrophotometry were below the ACS manufacturer's recommended range and why Hb readings by either assay did not match the guideline point estimates. This was consistent across different devices, both when assessed by a single user and in different laboratories. Unfortunately, the supplier was unable to provide additional controls from the same lots for further testing to clarify this issue. The observed narrow activity ranges meant that there was little difference in activity levels between the Intermediate and Low controls, limiting the activity range that was assessed. Two spectrophotometry readings for Intermediate and High controls appeared to be outliers in Phase A; both were included because they were within the recommended range, which may also have reduced the correlation between assays and affected the derived absolute difference. Although sites used different Hemocue models (Hemocue 201, 301 and 801), results from all devices were pooled, which may have increased the variability of the Hb reference. Finally, all measurements were done by highly qualified technicians in a research setting that does not reflect real-world use; reproducibility of the Biosensor may therefore be lower in a clinical setting.

The precision of the Biosensor demonstrated in this study and the good accuracy reported in field and other evaluation studies [21][22][23][26] indicate that the Biosensor could be a valuable quantitative point-of-care diagnostic; however, we found that spectrophotometry, when performed well, remains the gold standard, with precision superior to that of the Biosensor. The reproducibility observed in this study indicates that the technology is likely to permit direct comparison of results generated by different Biosensor devices and trained users [15]. If confirmed in clinical settings, the Biosensor has the potential to be an important tool to facilitate the broader roll-out of 8-aminoquinoline radical cure. Clinical data will be important to further investigate the poor discriminatory power of the Biosensor at low and intermediate G6PD activities observed with the lyophilised samples. However, given that the Biosensor's most probable use will be to distinguish G6PD normal individuals from those with less than normal activity (at a cut-off of 70% activity for tafenoquine), the poor discriminatory power observed at lower activities is unlikely to be of significant practical relevance. Finally, it will be important to verify whether the precision demonstrated here is maintained when the device is operated under routine conditions and in anaemic patients, and to define training requirements for intended users at the point of care. In conclusion, our findings suggest that the Biosensor offers reproducible quantitative diagnosis of G6PD status at the point of care in the hands of well-trained technicians. If repeatability and reproducibility, as well as the previously reported accuracy, are confirmed under real-life conditions, the Biosensor has the potential to simplify access to effective radical cure of P. vivax malaria.
contained herein are the private views of the authors and are not to be construed as official or as reflecting true views of the Department of the Army or the Department of Defense. The investigators have adhered to the policies for protection of human subjects as prescribed in AR 70-25.